Sentinel-2 Vegetation Phenology¶

This notebook calculates vegetation phenology changes using Sentinel-2 data. To detect changes, the algorithm uses the Normalized Difference Vegetation Index (NDVI), a common proxy for vegetation growth and health. The outputs of this notebook can be used to assess differences in agricultural fields over time or space, and to identify growth stages such as planting and harvesting.
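NDVI is computed from the red and near-infrared bands as (NIR − Red) / (NIR + Red), giving values in [−1, 1], with dense, healthy vegetation typically well above zero. A minimal sketch with synthetic reflectance values (the numbers are illustrative, not from this dataset):

```python
import numpy as np

# Synthetic surface-reflectance values for a few pixels (illustrative only)
red = np.array([0.10, 0.05, 0.30])
nir = np.array([0.50, 0.45, 0.32])

# NDVI = (NIR - Red) / (NIR + Red)
ndvi = (nir - red) / (nir + red)
print(ndvi)
```

For non-negative reflectances the result is always bounded to [−1, 1], which is a useful sanity check on loaded data.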

Load Data Cube Configuration and Import Utilities¶

In [1]:
import xarray as xr
import numpy as np  
import matplotlib.pyplot as plt

import datacube

import sys, os
os.environ['USE_PYGEOS'] = '0'

from dea_tools.plotting import display_map

### EASI tools
sys.path.append(os.path.expanduser('../scripts'))
from ceos_utils.data_cube_utilities.clean_mask import landsat_clean_mask_invalid, landsat_qa_clean_mask
from easi_tools import EasiDefaults
from easi_tools import notebook_utils
easi = EasiDefaults() # Get the default parameters for this system
Successfully found configuration for deployment "eail"
In [2]:
cluster, client = notebook_utils.initialize_dask(use_gateway=False)
display(cluster if cluster else client)
print(notebook_utils.localcluster_dashboard(client, server=easi.hub))

LocalCluster (status: running)
Workers: 4, Total threads: 8, Total memory: 29.00 GiB
Dashboard: http://127.0.0.1:8787/status
https://hub.eail.easi-eo.solutions/user/jhodge/proxy/8787/status
In [3]:
from datacube.utils.aws import configure_s3_access
configure_s3_access(aws_unsigned=False, requester_pays=True, client=client)
Out[3]:
<botocore.credentials.Credentials at 0x7f5183480400>
In [4]:
# Select a Product and Platform
product = "s2_l2a"
platform = "Sentinel-2A"

Define the Extents of the Analysis ▴¶

In [5]:
# NEW Yield Data from Vietnam (18-Nov-2022)

# lat_long = (10.443492, 105.281103) # 17, Chau Thanh, Yield High
# lat_long = (10.4172, 105.3635) # 28, Chau Thanh, Low Yield

# lat_long = (10.454342, 105.322838) #6, Chau Thanh, High Yield
# lat_long = (10.434116, 105.273150) #13, Chau Thanh, Low Yield
# lat_long = (10.392899, 105.188514) #37, Chau Thanh, High Yield
# lat_long = (10.394341, 105.126836) #47, Chau Thanh, Low Yield
# lat_long = (10.356519, 105.309450) #146, Chau Thanh, High Yield
# lat_long = (10.354744, 105.336739) #142, Chau Thanh, Low Yield

# box_size_deg = 0.0004 # Typically yields 5x5 pixel region

# Calculate the latitude and longitude bounds of the analysis box
# latitude = (lat_long[0]-box_size_deg/2, lat_long[0]+box_size_deg/2)
# longitude = (lat_long[1]-box_size_deg/2, lat_long[1]+box_size_deg/2)

latitude = easi.latitude
longitude = easi.longitude

# Define Time Range 
# The format of the time date is YYYY-MM-DD
start_date = '2022-04-01'
end_date = '2022-09-01'
time_extents = (start_date,end_date)
In [6]:
# The code below renders a map that can be used to view the region.
display_map(longitude,latitude)
/env/lib/python3.10/site-packages/dea_tools/plotting.py:313: FutureWarning: This function is deprecated. See: https://pyproj4.github.io/pyproj/stable/gotchas.html#upgrading-to-pyproj-2-from-pyproj-1
  all_longitude, all_latitude = transform(Proj(crs), Proj('EPSG:4326'), all_x,
Out[6]:
(Interactive map of the analysis region)

Load bands needed for NDVI and remove clouds and water¶

In [7]:
dc = datacube.Datacube()
In [8]:
sentinel_dataset = dc.load(latitude = latitude,
                           longitude = longitude,
                           time = time_extents,
                           product = product,
                           group_by = 'solar_day',
                           measurements = ['red', 'nir', 'SCL'],
                           output_crs = 'EPSG:6933',
                           resolution = (-10,10),
                           dask_chunks = {'time':1})
In [9]:
# Filter data using SCL band classification

# scl=0 > No Data
# scl=1 > Saturated
# scl=3 > Cloud Shadows
# scl=6 > Water
# scl=8 > Cloud Medium Probability
# scl=9 > Cloud High Probability
# scl=10 > Thin Cirrus Cloud

cloud_mask = (sentinel_dataset.SCL != 0) & (sentinel_dataset.SCL != 1) & \
             (sentinel_dataset.SCL != 3) & (sentinel_dataset.SCL != 8) & \
             (sentinel_dataset.SCL != 9) & (sentinel_dataset.SCL != 10)

land_mask =  ((sentinel_dataset.SCL != 6) & cloud_mask)

# Drop the SCL band as it is no longer needed
sentinel_dataset = sentinel_dataset.drop_vars('SCL')

# Apply land mask ... NO Clouds, NO Cloud Shadows and NO Water pixels
cleaned_dataset = sentinel_dataset.where(land_mask)
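The chained inequality comparisons above can be written more compactly with `isin`; a sketch on a synthetic SCL array (plain numpy here, but `xarray.DataArray.isin` behaves the same on the loaded dataset):

```python
import numpy as np

# Synthetic SCL classifications: 0 = no data, 3 = cloud shadow,
# 4 = vegetation, 6 = water, 8 = cloud medium probability
scl = np.array([[0, 4, 6],
                [8, 4, 3]])

# Classes to reject: no data, saturated, cloud shadow, clouds, thin cirrus
bad_classes = [0, 1, 3, 8, 9, 10]

cloud_mask = ~np.isin(scl, bad_classes)   # True where not cloud/invalid
land_mask = cloud_mask & (scl != 6)       # additionally reject water
print(land_mask)
```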

Define NDVI and add it to the dataset¶

In [10]:
def NDVI(dataset):
    return (dataset.nir - dataset.red)/(dataset.nir + dataset.red)
In [11]:
sentinel_dataset['NDVI'] = NDVI(sentinel_dataset)
In [12]:
cleaned_dataset['NDVI'] = NDVI(cleaned_dataset)
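Because the same `NDVI` function is applied to a whole `Dataset`, it works band-wise through xarray broadcasting; a sketch on a small synthetic dataset with the same structure (band values are illustrative), including the bounds check mentioned above:

```python
import numpy as np
import xarray as xr

# Small synthetic dataset mimicking the structure of the loaded bands
rng = np.random.default_rng(0)
ds = xr.Dataset({
    'red': (('y', 'x'), rng.uniform(0.01, 0.4, (4, 4))),
    'nir': (('y', 'x'), rng.uniform(0.1, 0.6, (4, 4))),
})

def NDVI(dataset):
    return (dataset.nir - dataset.red) / (dataset.nir + dataset.red)

ds['NDVI'] = NDVI(ds)

# NDVI from non-negative reflectances always falls in [-1, 1]
assert float(ds.NDVI.min()) >= -1.0 and float(ds.NDVI.max()) <= 1.0
```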
In [13]:
cleaned_dataset
Out[13]:
<xarray.Dataset>
Dimensions:      (time: 30, y: 1024, x: 966)
Coordinates:
  * time         (time) datetime64[ns] 2022-04-05T16:02:51 ... 2022-08-28T16:...
  * y            (y) float64 4.418e+06 4.418e+06 ... 4.408e+06 4.408e+06
  * x            (x) float64 -7.386e+06 -7.386e+06 ... -7.376e+06 -7.376e+06
    spatial_ref  int32 6933
Data variables:
    red          (time, y, x) float64 dask.array<chunksize=(1, 1024, 966), meta=np.ndarray>
    nir          (time, y, x) float64 dask.array<chunksize=(1, 1024, 966), meta=np.ndarray>
    NDVI         (time, y, x) float64 dask.array<chunksize=(1, 1024, 966), meta=np.ndarray>
Attributes:
    crs:           EPSG:6933
    grid_mapping:  spatial_ref
In [14]:
# Display the acquisition dates of the time slices in a table
import pandas as pd
pd.DataFrame({'time': cleaned_dataset.time.values})
Out[14]:
time
0 2022-04-05 16:02:51
1 2022-04-10 16:02:54
2 2022-04-15 16:02:48
3 2022-04-20 16:02:57
4 2022-04-25 16:02:46
5 2022-04-30 16:02:57
6 2022-05-05 16:02:48
7 2022-05-10 16:02:56
8 2022-05-15 16:02:51
9 2022-05-20 16:02:59
10 2022-05-25 16:02:52
11 2022-05-30 16:03:00
12 2022-06-04 16:02:53
13 2022-06-09 16:03:01
14 2022-06-14 16:02:56
15 2022-06-19 16:03:04
16 2022-06-24 16:02:57
17 2022-06-29 16:03:05
18 2022-07-04 16:02:57
19 2022-07-09 16:03:05
20 2022-07-14 16:02:57
21 2022-07-19 16:03:04
22 2022-07-24 16:02:57
23 2022-07-29 16:03:02
24 2022-08-03 16:02:57
25 2022-08-08 16:03:04
26 2022-08-13 16:02:55
27 2022-08-18 16:03:05
28 2022-08-23 16:02:53
29 2022-08-28 16:03:05

Plot NDVI vs Time¶

In [15]:
nanmask = np.any(np.isfinite(cleaned_dataset.NDVI), axis=(1,2))
 
plt.figure(figsize=(12, 6))
plt.plot(cleaned_dataset.time[nanmask],
         cleaned_dataset['NDVI'][nanmask].median(dim=['x','y']),
         color='red',marker='o')
plt.xlabel("Date")
plt.ylabel("NDVI")
plt.title("Median NDVI over the analysis region");
/env/lib/python3.10/site-packages/rasterio/warp.py:344: NotGeoreferencedWarning: Dataset has no geotransform, gcps, or rpcs. The identity matrix will be returned.
  _reproject(
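Residual cloud contamination can leave spikes or dips in the NDVI series even after SCL masking; a rolling median is a simple smoother before interpreting phenology stages. A sketch on a synthetic 5-day series using pandas (the window size is an assumption to tune for your revisit interval):

```python
import pandas as pd

# Synthetic 5-day NDVI series with one cloud-contaminated dip at index 3
dates = pd.date_range('2022-04-05', periods=10, freq='5D')
ndvi = pd.Series([0.30, 0.35, 0.40, 0.05, 0.50, 0.55, 0.60, 0.62, 0.60, 0.58],
                 index=dates)

# Centered 3-sample rolling median suppresses single-observation outliers
smoothed = ndvi.rolling(window=3, center=True, min_periods=1).median()
print(smoothed.round(2).tolist())
```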
In [16]:
# Output the median NDVI time series to CSV
filename = "output.csv"
ndvi_median = cleaned_dataset['NDVI'].median(dim=['y','x'])
ndvi_median.to_dataframe().to_csv(filename)
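The exported CSV can be read back with pandas for downstream analysis; a sketch that round-trips a small synthetic series through `to_csv` (the `NDVI` column name matches the variable written above; the temporary path is illustrative):

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Synthetic median-NDVI time series standing in for the exported data
dates = pd.date_range('2022-04-05', periods=5, freq='5D')
series = pd.Series(np.linspace(0.2, 0.6, 5), index=dates, name='NDVI')
series.index.name = 'time'

# Round-trip through CSV, parsing the time column back into datetimes
path = os.path.join(tempfile.mkdtemp(), 'output.csv')
series.to_frame().to_csv(path)
restored = pd.read_csv(path, index_col='time', parse_dates=True)

assert np.allclose(restored['NDVI'].values, series.values)
```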
In [ ]: